DATE: Wednesday, August 02, 2006

DB=PGPB,USPT,USOC; PLUR=NO; OP=OR

□ L36 (((character adj 1 string$) or character-string$ or (text adj 1 string$) or text-string$ or word or words) near match$)

□ L35 [illegible] (document or documents) near (row or rows))

□ L34 L31 and (xml near (document or documents) near (row or rows)) 6

□ L33 L31 and (xml near document near tree near (row or rows)) 0

□ L32 L31 and ((character adj 1 string$) or character-string$ or (text adj 1 string$) or text-string$ or word or words)

□ L31 [illegible] document near tree near (node$ or parent or root or child$))

□ L30 715/513.ccls. 2770

□ L29 707/100.ccls. 4516

□ L28 707/6.ccls. 1841

□ L27 707/3.ccls. 7039
□ L26 707/1.ccls. 5048

□ L25 L22 and match$ 1

□ L24 L22 and (start or begin) 0 

□ L23 L22 and ((character adj 1 string$) or character-string$ or (text adj 1 string$) or text-string$ or word or words)

□ L22 L21 and xml 1 
□ L21 L20 and (document or documents) 1

□ L20 L19 and (row or rows) 1

□ L19 20040044959.pn. 1 

□ L18 L17 and (end or finish) 1 
□ L17 L16 and (start or begin) 1

□ L16 L15 and (row or rows) 1

□ L15 [illegible] ((character adj 1 string$) or character-string$ or (text adj 1 string$) or text-string$ or word or words)

□ L14 L13 and match$ 1

□ L13 20030014397.pn. 1 

□ L12 [illegible] (document or documents) with ((character adj 1 string$) or character-string$ or (text adj 1 string$) or text-string$ or word or words) near match$)

□ L11 (xml near (document or documents) near ((character adj 1 string$) or character-string$ or (text adj 1 string$) or text-string$ or word or words) near match$)

□ L10 L9 and (tree near (document or documents)) 17

□ L9 ((character adj 1 string$) or character-string or sting or strings or (text adj 1 string) or text-string or word or words)

□ L8 (xml near (document or documents) near match$) 76

□ L7 (xml near (document or documents) near (row or rows)) 21

□ L6 L5 and ((xml adj 1 (document or documents)) near tree) 15

□ L5 (xml near document or documents near retriev$ near system).ti. 46

□ L4 (xml near document or documents near retriev$ near system) 5749

DB=JPAB; PLUR=NO; OP=OR 

□ L3 (xml near document or documents near retriev$ near system) 149

□ L2 (xml near document or documents near retriev$) 441

DB=USPT; PLUR=NO; OP=OR

□ L1 (6635089 6519617 6557043 6571292 6671853 6681370 6781609 6904562
6938204 6941511 6463440 6523062 6662342 7069503 6397219 6466940
6487566 6886115 7043686 6094649 6366934 6421656 6519597 6584459
6606620 6625596 6826555 6832219 6898609 6981002 7043472 7062709
6635088 6832215 6931590 6966027 7020838 6707581 6810414 6901410
6476833 6745206 6847960 7054859 7058883 6182029 6199081 6249794
6738767 6850950).pn. (6067559 6088675 6125391 6223190 6226675 6223190
6226675 6279006 6301614 6321265 6393456 6418448 6426778 6446113
6507856 6529905 6532455 6535884 6538673 6542911 6542912 6569207
6589291 6604100 6613098 6631379 6636845 6640241 6643633 6654734
6654737 6657568 6675353 6681223 6684204 6684216 6684222 6684370
6711554 6721727 6725231 6732095 6748569 6763343 6763499 6766298
6766326 6766330 6772216 6782380).pn. (6785673 6785685 6785902 6792575
6792577 6799184 6804662 6804677 6823361 6826553 6829745 6836778
6836857 6845380 6845499 6850948 6859821 6871204 6874141 6874146
6883137 6904432 6907455 6910029 6912538 6915304 6920607 6920608
6928449 6928640 6931532 6934712 6934740 6938079 6941459 6941510
6947945 6948133 6950985 6959415 6959416 6961760 6968500 6971096
6978422 6981222 6986121 6990632 6993476 6993714).pn. (6993715 6996571
6996773 6996781 7007230 7007231 7013311 7013424 7017112 7020681
7020683 7024425 7031956 7047253 7051042 7055094 7058644 7062507
7062708 7069504 7072896 7073123 7076729 7055093 7051040 5469354
5821929 5748953 5845304 5465353 5649218 5842217 5892843 6003043
6144963 4873426 4985863 5047918 5220625 5265242 5327341 5457794
5590317 5628003 5680612 5706365 5706497 5752021 5787414 5860075).pn.
(5940846 6035338 6108674 6169999 6278992 6327387 6437869 7010519
7047238 7072889 6263332 6701314 6779025 7013425 6490591 6898761
7054854 7058645 6311194 6449620 6539422 6581062 6684789 6741242
6764009 6792428 6810136 6810429 6817008 6826597 6854120 6857013
6886005 6964009 6996776 7003722 7016910 7024413 7035866 6083276
6502112 6601075 6240407 6249844 6405211 6442595 6480865 6490564
6507817 6507857).pn. (6578000 6591260 6604099 6643650 6658428 6675355
6717593 6725424 6725426 6769606 6779154 6796489 6799299 6807565
6823495 6825781 6829606 6829746 6847999 6880125 6886169 6889360
6895551 6901431 6901441 6907564 6910040 6912529 6915456 6925631
6934908 6940953 6947932 6950984 6952800 6952802 6954896 6963869
6964015 6976020 6986101 6990514 6990585 6990654 6996770 7010742
7013426 7016963 7020651).pn. 297

END OF SEARCH HISTORY 
INZZ


xml NEAR (document OR documents) 
NEAR (row OR rows) 


unrestricted 
INZZ


xml NEAR (document OR documents) 


unrestricted 


2051 


INZZ


2 AND tree 


unrestricted 


315 


INZZ


3 AND start ADJ tag WITH end ADJ tag 


unrestricted 


0 
INZZ


2 AND start ADJ tag WITH end ADJ tag 


unrestricted 


0 
3 AND tag OR tags


unrestricted 
3 AND (tag OR tags)


unrestricted 


15 


string$


unrestricted 
Document

1 SIOUX: an efficient index for processing structural XQueries.

2 XML query processing using a schema-based numbering scheme.

3 Prefix path streaming: a new clustering method for optimal holistic XML twig pattern matching.

4 Discovery of maximally frequent tag tree patterns with contractible variables from semistructured documents.

6 A new path expression computing approach for XML data. 

7 Incremental validation of XML documents. 

8 Naming in XML documents. 

9 Efficient structural joins on indexed XML documents.

11 Discovery of frequent tree structured patterns in semistructured Web documents.

12 Adaptive conversion of Web content for mobile terminals. 

13 Querying XML documents made easy: nearest concept queries.

14 An automated approach for retrieving hierarchical data from HTML tables.

15 An XML [illegible]



document 1 of 15
Inspec - 1898 to date (INZZ) 

Accession number & update

0008680237 20051211. 

Title 

SIOUX: an efficient index for processing structural XQueries. 
Conference information 

Database and Expert Systems Applications. 16th International Conference, DEXA 2005. Proceedings, 
Copenhagen, Denmark, 22-26 Aug. 2005. 
Source 

Database and Expert Systems Applications. 16th International Conference, DEXA 2005. Proceedings
(Lecture Notes in Computer Science Vol. 3588), 2005, p. 564-75, 22 refs, pp. xx+955, ISBN: 3-540-28566-0.
Publisher: Springer-Verlag, Berlin, Germany.
Author(s) 

Gardarin-G, Yeh-L.

Editor(s): Andersen-K-V, Debenham-J, Wagner-R.
Author affiliation 

Gardarin, G., Yeh, L., PRISM Lab., Univ. of Versailles, France. 
Abstract 

XML DBMSs require new indexing techniques to efficiently process structural search and full-text
search as integrated in XQuery. Much research has been done for indexing XML documents. In this
paper we first survey some of them and suggest a classification scheme. It appears that most
techniques are indexing on paths in XML documents and maintain a separated index on values. In
some cases, the two indexes are merged and/or tags are encoded. We propose a new method that
indexes XML documents on ordered trees, i.e., two documents are in the same equivalence class if
they have the same tree structure, with identical elements in order. We develop a simple benchmark
to compare our method with two well-known European products. The results show that indexing on full
trees leads to smaller index size and achieves 1 to 10 times better query performance in comparison 
with classical industrial methods that are path-based. 
Descriptors 

DATABASE-INDEXING; QUERY-PROCESSING; TREE-DATA-STRUCTURES; XML.
Classification codes 

C6160 Database-management-systems-DBMS *;
C6120 File-organisation;
C6130D Document-processing-techniques.
Keywords 

SIOUX; structural-Xqueries-processing; full-text-search; XML-document; ordered-trees; tree-structure;
query-performance.
Treatment codes 

P Practical.
Language 

English. 
Publication type 

Conference-proceedings.
Publication year 

2005. 
Publication date 

20050000. 
Edition 

2005049. 
Copyright statement 

Copyright 2005 IEE.



COPYRIGHT BY The IET, Stevenage, UK
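The SIOUX abstract groups documents into equivalence classes by ordered tree structure. The following is a minimal Python sketch of that grouping idea only, assuming a structural signature built from element tags in document order, with text and attribute values ignored. The names `tree_signature` and `group_by_structure` are hypothetical; this is not the SIOUX index itself.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def tree_signature(elem):
    """Ordered structural signature (hypothetical helper): the element's tag
    followed by the signatures of its children in document order. Text and
    attributes are ignored."""
    return (elem.tag, tuple(tree_signature(child) for child in elem))


def group_by_structure(docs):
    """Map each distinct ordered tree structure to the ids of the documents
    that share it, i.e. one equivalence class per signature."""
    classes = defaultdict(list)
    for doc_id, xml_text in docs.items():
        classes[tree_signature(ET.fromstring(xml_text))].append(doc_id)
    return dict(classes)


docs = {
    "d1": "<book><title>A</title><author>X</author></book>",
    "d2": "<book><title>B</title><author>Y</author></book>",
    # Same tags as d1 and d2 but in a different order: a different class.
    "d3": "<book><author>Z</author><title>C</title></book>",
}
for ids in group_by_structure(docs).values():
    print(ids)
```

Because the signature keeps child order, `d3` lands in its own class even though it uses the same tags as `d1` and `d2`.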



document 2 of 15

Accession number & update 

0008346623 20051201. 

Title 

XML query processing using a schema-based numbering scheme. 
Conference information 

Database and XML Technologies. Second International XML Database Symposium, XSym 2004.
Proceedings, Toronto, Ont., Canada, 29-30 Aug. 2004.
Source 

Database and XML Technologies. Second International XML Database Symposium, XSym 2004. 
Proceedings (Lecture Notes in Comput. Sci. Vol.3186), 2004, p. 21-34, 22 refs, pp. x+234, ISBN: 3-540-22969-8.
Publisher: Springer-Verlag, Berlin, Germany.
Author(s) 

Kha-D-D, Yoshikawa-M.

Editor(s): Bellahsene-Z, Milo-T, Rys-M, Suciu-D, Unland-R.
Author affiliation 

Kha, D.D., IMI Project of COE Program, Nagoya Univ. 
Abstract 

Establishing the hierarchical order among XML elements is an essential function of XML query 
processing techniques. Although most XML documents have an associated DTD or XML schema, the 
document structure information has not been utilized efficiently in query processing techniques
proposed so far. In this paper, we propose a novel technique that uses DTD or XML schema to improve
the disk I/O complexity of XML query processing. We present a schema-based numbering scheme
called SPIDER that incorporates both structure information and tag names extracted from the
document structure descriptions. Given the tag name and the identifier of an element, SPIDER can
determine the tag names and the identifiers of the ancestor elements without disk I/O. Based on
SPIDER, we designed a mechanism called VirtualJoin that significantly reduces disk I/O workload for
processing XML queries. Our experiments indicated that SPIDER outperforms the structural join 
techniques Stack-Tree and PathStack in XML query processing, especially for XML queries with heavy 
join workload and large data sets. 
Descriptors 

COMPUTATIONAL-COMPLEXITY; QUERY-PROCESSING; RELATIONAL-DATABASES; TREE-DATA-STRUCTURES; XML.
Classification codes 

C6160D Relational-databases *;
C6130D Document-processing-techniques;
C4240C Computational-complexity;
C6120 File-organisation.
Keywords 

XML-query-processing; schema-based-numbering-scheme; XML-documents; disk-I/O-complexity;
SPIDER; tag-names; VirtualJoin; Stack-Tree-technique; PathStack-technique.
Treatment codes 

P Practical. 
Language 

English. 
Publication type 

Conference-proceedings. 
Publication year 

2004. 
Publication date 

20040000. 
Edition 

2005013. 
Copyright statement 

Copyright 2005 IEE.






document 3 of 15

Accession number & update 

0008289033 20051201. 

Title 

Prefix path streaming: a new clustering method for optimal holistic XML twig pattern matching. 
Conference information

Database and Expert Systems Applications. 15th International Conference, DEXA 2004. Proceedings,
Zaragoza, Spain, 30 Aug.-1 Sept. 2004.
Source 

Database and Expert Systems Applications. 15th International Conference, DEXA 2004. Proceedings
(Lecture Notes in Comput. Sci. Vol.3180), 2004, p. 801-10, 8 refs, pp. xxi+972, ISBN: 3-540-22936-1.

Publisher: Springer-Verlag, Berlin, Germany. 
Author(s)

Ting-Chen, Tok-Wang-Ling, Chee-Yong-Chan.

Editor(s): Galindo-F, Takizawa-M, Traunmuller-R.
Author affiliation

Ting Chen, Tok Wang Ling, Chee-Yong Chan, Sch. of Comput., Nat. Univ. of Singapore. 
Abstract 

Searching for all occurrences of a twig pattern in an XML document is an important operation in XML
query processing. Recently a class of holistic twig pattern matching algorithms has been proposed. 
Compared with the prior approaches, the holistic method avoids generating large intermediate results 
which do not contribute to the final answer. The method is CPU and I/O optimal when twig patterns 
only have ancestor-descendant relationships. The holistic twig-pattern matching method proposed 
earlier (N. Bruno et al. (2002)) operates on element streams which cluster all XML elements with the
same tag name together. In this paper we introduce a clustering method called prefix path streaming 
(PPS) and new holistic twig pattern matching algorithms based on PPS. PPS clusters elements of XML 
documents according to the paths from root to the elements. This clustering approach avoids 
unnecessary scanning of irrelevant portion of XML documents. More importantly, we develop optimal 
algorithms based on PPS streaming which can process a large class of twig patterns consisting of both 
ancestor-descendant and parent-child relationships.
Descriptors 

PATTERN-CLUSTERING; PATTERN-MATCHING; QUERY-PROCESSING; STATISTICAL-ANALYSIS;
TREE-DATA-STRUCTURES; XML.
Classification codes 

C6130D Document-processing-techniques *;
C6130M Multimedia;
C6120 File-organisation;
C1140Z Other-topics-in-statistics;
C6160 Database-management-systems-DBMS.
Keywords 

XML-document; XML-query-processing; holistic-twig-pattern-matching-algorithms;
ancestor-descendant-relationships; clustering-method; prefix-path-streaming; parent-child-relationships.
Treatment codes 

P Practical.
Language 

English. 
Publication type 

Conference-proceedings.
Publication year 

2004. 
Publication date 

20040000. 
Edition 

2005007. 
Copyright statement 

Copyright 2005 IEE.



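To make the contrast in the prefix path streaming abstract concrete, here is a hedged Python sketch, assuming a small in-memory document parsed with `xml.etree.ElementTree`: tag streaming puts every element with the same tag into one stream, while prefix-path streaming keys each stream by the root-to-element tag path. The helper names are hypothetical, and the holistic twig matching algorithms built on PPS are not shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def tag_streams(root):
    """Tag streaming: one stream per tag name, elements in document order."""
    streams = defaultdict(list)
    for elem in root.iter():
        streams[elem.tag].append(elem)
    return streams


def prefix_path_streams(root):
    """Prefix path streaming: one stream per root-to-element tag path."""
    streams = defaultdict(list)

    def visit(elem, prefix):
        path = prefix + "/" + elem.tag
        streams[path].append(elem)
        for child in elem:
            visit(child, path)

    visit(root, "")
    return streams


root = ET.fromstring(
    "<lib><book><title/></book><journal><title/></journal><book><title/></book></lib>"
)
print(len(tag_streams(root)["title"]))                    # every title element
print(len(prefix_path_streams(root)["/lib/book/title"]))  # book titles only
```

In this sketch, a query for `title` children of `book` can read only the `/lib/book/title` stream, whereas with tag streaming it must also scan the irrelevant `journal` titles.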



document 4 of 15

Accession number & update

0008212238 20051201. 

Title 

Discovery of maximally frequent tag tree patterns with contractible variables from semistructured 
documents. 
Conference information 

Advances in Knowledge Discovery and Data Mining. 8th Pacific-Asia Conference, PAKDD 2004.
Proceedings, Sydney, NSW, Australia, 26-28 May 2004. 
Sponsor(s): SAS; Univ. of Technol., Sydney.
Source 

Advances in Knowledge Discovery and Data Mining. 8th Pacific-Asia Conference, PAKDD 2004. 
Proceedings (Lecture Notes in Artificial Intelligence Vol.3056), 2004, p. 133-44, 12 refs, pp. xix+713, 
ISBN: 3-540-22064-X. 
Publisher: Springer-Verlag, Berlin, Germany.
Author(s) 

Miyahara-T, Suzuki-Y, Shoudai-T, Uchida-T, Takahashi-K, Ueda-H.

Editor(s): Dai-H, Srikant-R, Zhang-C.
Author affiliation 

Miyahara, T., Fac. of Sci., Hiroshima City Univ., Japan. 
Abstract 

In order to extract meaningful and hidden knowledge from semistructured documents such as HTML 
or XML files, methods for discovering frequent patterns or common characteristics in semistructured 
documents have been more and more important. We propose new methods for discovering maximally 
frequent tree structured patterns in semistructured Web documents by using tag tree patterns as 
hypotheses. A tag tree pattern is an edge labeled tree which has ordered or unordered children and 
structured variables. An edge label is a tag or a keyword in such Web documents, and a variable can 
match an arbitrary subtree, which represents a field of a semistructured document. As a special case, 
a contractible variable can match an empty subtree, which represents a missing field in a 
semistructured document. Since semistructured documents have irregularities such as missing fields, 
a tag tree pattern with contractible variables is suited for representing tree structured patterns in 
such semistructured documents. First, we present an algorithm for generating all maximally frequent 
ordered tag tree patterns with contractible variables. Second, we give an algorithm for generating all 
maximally frequent unordered tag tree patterns with contractible variables. 
Descriptors 

DATA-MINING; DOCUMENT-HANDLING; INTERNET; PATTERN-CLASSIFICATION; TREE-DATA-STRUCTURES; XML.
Classification codes 

C6170K Knowledge-engineering-techniques *;
C6130D Document-processing-techniques;
C7210N Information-networks;
C6120 File-organisation.
Keywords 

maximally-frequent-tag-tree-pattern-discovery; contractible-variable; HTML; XML;
semistructured-Web-document; edge-labeled-tree; maximally-frequent-unordered-tag-tree-pattern.
Treatment codes 

P Practical.
Language 

English. 
Publication type 

Conference-proceedings.
Publication year 

2004. 
Publication date 

20040000. 
Edition 

2004049. 
Copyright statement 

Copyright 2004 IEE.
Accession number & update
0007822103 20051201. 

Title 

An abstract grammar for XML document editing. 
Source 

Journal of KISS: Software and Applications, {J-KISS-Softw-Appl-South-Korea}, April 2003, vol. 30, no. 
3-4, p. 268-77, 15 refs, CODEN: CKNBFV, ISSN: 1229-6848. 
Publisher: Korea Inf. Sci. Soc, South Korea. 
Author(s) 

[illegible]
Abstract 

A document type definition (DTD) which defines tags for a document is an XML document 
grammar that defines syntactic structure of a document. An XML document keeps the rules and 
must be parsed to check validation. To parse an XML document, the deterministic parsing method of
programming language is irrelevant because it does not satisfy the definition of deterministic content
model in element declaration. In this paper, we consider editing of a valid XML document in a
syntax-directed editing environment, and we suggest the internal storage representations of syntax in DTD
and their algorithms. The consequence is that a syntactic structure of textual DTD is transformed into 
graph and table structures. The table structure of DTD is interpreted as a context free grammar which 
has attribute values and is used in syntax-directed editor for XML. We called this the XML abstract 
grammar and showed generated results and examples. 
Descriptors 

ATTRIBUTE-GRAMMARS; CONTEXT-FREE-GRAMMARS; DOCUMENT-HANDLING;
PROGRAMMING-LANGUAGE-SEMANTICS; TREE-DATA-STRUCTURES; XML.
Classification codes 

C4210L Formal-languages-and-computational-linguistics *;
C6130D Document-processing-techniques;
C6140D High-level-languages;
C6120 File-organisation.
Keywords 

abstract-grammar; XML-document-editing; syntax-directed-editing; document-type-definition;
DTD; deterministic-parsing; syntactic-structure; internal-storage-representation; graph-structure;
table-structure; reference-attribute; context-free-grammar.
Treatment codes 

P Practical.
Language 

Chinese. 
Publication type 

Journal-paper.
Availability

SICI: 1229-6848(200304)30: 3/4L268:AGDE; 1-W. 
Publication year 

2003. 
Publication date 

20030400. 
Edition 

2004001. 
Copyright statement 

Copyright 2004 IEE.
0007816121 20051201.

Title 

A new path expression computing approach for XML data. 

Conference information 

Efficiency and Effectiveness of XML Tools and Techniques and Data Integration over the Web. VLDB
2002 Workshop EEXTT and CAiSE 2002 Workshop DIWeb. Revised Papers, London, UK, Dec. 2002.

Source 

Efficiency and Effectiveness of XML Tools and Techniques and Data Integration over the Web. VLDB
2002 Workshop EEXTT and CAiSE 2002 Workshop DIWeb. Revised Papers (Lecture Notes in Computer Science
Vol.2590), 2003, p. 35-46, 11 refs, pp. x+258, ISBN: 3-540-00736-9.
Publisher: Springer-Verlag, Berlin, Germany. 
Author(s) 

Jianhua-Lv, Guoren-Wang, Jeffrey-Xu-Yu, Ge-Yu, Hongjun-Lu, Bing-Sun.

Editor(s): Bressan-S, Chaudhri-A-B, Lee-M-L, Yu-J-X, Lacroix-Z.
Author affiliation 

Jianhua Lv, Guoren Wang, Northeastern Univ. of China, Shenyang, China. 
Abstract 

Most query languages in XML database systems use regular path expressions (RPE) to query or extract 
data from databases and some query processing and optimization techniques have been proposed for 
RPEs. Conceptually XML documents are collections of path instances. Each path instance should
conform to an XML element tag sequence, called path schema. An RPE query can be written as an
automaton that can represent a language, while path schemas can be seen as sentences. A novel RPE
computing approach, automaton match (AM), is proposed. AM queries the RPEs by matching the 
automatons with path schemas. The experimental results show AM is quite efficient for computing RPE 
queries. 
Descriptors 

DATABASE-MANAGEMENT-SYSTEMS; FINITE-STATE-MACHINES; HYPERMEDIA-MARKUP-LANGUAGES;
QUERY-LANGUAGES; QUERY-PROCESSING; [illegible]
Classification codes 

C6160 Database-management-systems-DBMS *;
C6130M Multimedia;
C4220 Automata-theory;
C6120 File-organisation;
C6130D Document-processing-techniques.
Keywords 

query-language; XML-database-system; regular-path-expression; RPE-query; data-querying;
data-extraction; query-processing-technique; query-optimization-technique; XML-document;
XML-element-tag-sequence; path-schema; automaton-match.
Treatment codes 

P Practical.

Language 

English. 
Publication type 

Conference-proceedings.
Publication year 

2003. 
Publication date 

20030000. 
Edition 

2003050. 
Copyright statement 

Copyright 2003 IEE.
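The automaton match idea in the abstract above, matching an RPE against path schemas rather than against every path instance, can be illustrated with a short Python sketch. As an assumption for illustration, the RPE is written directly as a Python regular expression over "/"-joined tag paths, so Python's `re` engine stands in for the compiled automaton. The helpers `path_schemas` and `match_rpe` are hypothetical and do not reproduce the authors' AM algorithm.

```python
import re
import xml.etree.ElementTree as ET


def path_schemas(root):
    """Collect the distinct root-to-element tag sequences (path schemas)."""
    schemas = set()

    def visit(elem, prefix):
        path = prefix + (elem.tag,)
        schemas.add(path)
        for child in elem:
            visit(child, path)

    visit(root, ())
    return schemas


def match_rpe(rpe_regex, schemas):
    """Return, sorted, the schemas whose '/'-joined form the RPE accepts.
    The RPE is assumed to be given as a Python regular expression."""
    automaton = re.compile(rpe_regex)
    return sorted(s for s in schemas if automaton.fullmatch("/" + "/".join(s)))


root = ET.fromstring(
    "<lib><shelf><book><title/></book></shelf>"
    "<book><title/></book><journal><title/></journal></lib>"
)
# Roughly the XPath-style RPE //book/title: any prefix path, then book/title.
print(match_rpe(r"(/[^/]+)*/book/title", path_schemas(root)))
```

The design point the sketch tries to show is that the expression is tested once per distinct schema (eight here), not once per element; the elements lying on a matching schema are then the candidate answers.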
Incremental validation of XML documents.
Conference information

Database Theory - ICDT 2003. 9th International Conference. Proceedings, Siena, Italy, 8-10 Jan. 2003. 
Source 

Database Theory - ICDT 2003. 9th International Conference. Proceedings (Lecture Notes in Computer
Science Vol.2572), 2003, p. 47-63, 25 refs, pp. xi+454, ISBN: 3-540-00323-1.

Publisher: Springer-Verlag, Berlin, Germany. 
Author(s)

Papakonstantinou-Y, Vianu-V.

Editor(s): Calvanese-D, Lenzerini-M, Motwani-R.
Author affiliation 

Papakonstantinou, Y., Vianu, V., Comput. Sci. & Eng., California Univ., San Diego, CA, USA. 
Abstract 

We investigate the incremental validation of XML documents with respect to DTDs and XML schemas,
under updates consisting of element tag renamings, insertions and deletions. DTDs are modeled as
extended context-free grammars and XML schemas are abstracted as "specialized DTDs", allowing
element types to be decoupled from element tags. For DTDs, we exhibit an O(m log n) incremental
validation algorithm using an auxiliary structure of size O(n), where n is the size of the document
and m the number of updates. For specialized DTDs, we provide an O(m log [text missing] of size O(n).
This is a significant improvement over brute-force revalidation from scratch.
XML-document-incremental-validation-algorithm; XML-schema; context-free-grammar;
document-type-definition; DTD.
International Conferences Proceedings, Irvine, CA, USA, Aug. 2002.
Sponsor(s): Boeing, USA; OntoWeb, Netherlands; Telcordia Technol., USA.
Source 

On the Move to Meaningful Internet Systems 2002. CoopIS, DOA, and ODBASE. Confederated 
International Conferences CoopIS, DOA, and ODBASE 2002 Proceedings (Lecture Notes in Computer 
Science Vol.2519), 2002, p. 1287-303, 24 refs, pp. xxiii+1367, ISBN: 3-540-00106-9. 
Publisher: Springer-Verlag, Berlin, Germany. 
Author(s) 

Lawrence-R.

Editor(s): Meersman-R, Tari-Z.
Author affiliation 

Lawrence, R., Dept. of Comput. Sci., Iowa Univ., Iowa City, IA, USA.
Abstract 

XML is now an established standard for data communication and representation. There has been
considerable work on XML querying, modeling, and type definition. However, one of the most
important aspects of XML, standardized tag naming for conveying semantics, has been almost ignored
by the research community. This paper argues that the naming aspects of XML are important to
consider and presents a naming methodology for XML tags that captures increased context
information. Using semantic tag names opens up the possibility of semantic querying of XML 
documents, which simplifies query formulation by reducing the reliance on path expressions. A 
semantic query facility allows XML documents with similar semantics, but organized using different 
DTDs, to be queried without modifying the original query formulation. Finally, we demonstrate an 
algorithm for converting semantic queries to structural queries by disambiguating incomplete path 
expressions. 
Descriptors 
expressions; XML-type-definition; standardized-tag-naming; semantics; semantic-queries;
2002.
Proceedings of the Twenty-eighth International Conference on Very Large Data Bases, 2002, p. 263-74,
40 refs, pp. xxvi+1118, ISBN: 1-55860-869-9.
Publisher: Morgan Kaufmann Publishers, San Francisco, CA, USA. 
Author(s)

Shu-Yao-Chien, Vagena-Z, Donghui-Zhang, Tsotras-V-J, Zaniolo-C.
Editor(s): Bernstein-P-A, Ioannidis-Y-E, Ramakrishnan-R, Papadias-D.
Abstract 

Queries on XML documents typically combine selections on element contents, and, via path 
expressions, the structural relationships between tagged elements. Structural joins are used to find all 
pairs of elements satisfying the primitive structural relationships specified in the query, namely, 
parent-child and ancestor-descendant relationships. Efficient support for structural joins is thus the key
to efficient implementations of XML queries. Recently proposed node numbering schemes enable the 
capturing of the XML document structure using traditional indices (such as B+-trees or R-trees). This 
paper proposes efficient structural join algorithms in the presence of tag indices. We first concentrate 
on using B+-trees and show how to expedite a structural join by avoiding collections of elements that 
do not participate in the join. We then introduce an enhancement (based on sibling pointers) that 
further improves performance. Such sibling pointers are easily implemented and dynamically
maintainable. We also present a structural join algorithm that utilizes R-trees. An extensive 
experimental comparison shows that the B+-tree structural joins are more robust. Furthermore, they 
provide drastic improvement gains over the current state of the art. 
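A minimal Python sketch of a structural join over a node numbering scheme, assuming region labels (start, end, level) assigned in one depth-first walk. Because regions nest properly, d is a descendant of a exactly when d's start falls strictly inside a's region, and a parent-child pair additionally differs by one level. A sorted list searched with `bisect` stands in for a B+-tree on start positions, so each ancestor scans only the candidates inside its region. `Region`, `region_encode` and `structural_join` are hypothetical names; the sibling-pointer and R-tree variants described in the abstract are not shown.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    tag: str
    start: int  # counter value when the element is entered
    end: int    # counter value when the element is left
    level: int  # depth below the root (root = 0)


def region_encode(root: ET.Element) -> List[Region]:
    """Label every element with a (start, end, level) region in one DFS."""
    regions: List[Region] = []
    counter = 0

    def visit(elem: ET.Element, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child, level + 1)
        regions.append(Region(elem.tag, start, counter, level))
        counter += 1

    visit(root, 0)
    return regions


def structural_join(ancestors: List[Region], descendants: List[Region],
                    parent_child: bool = False) -> List[Tuple[Region, Region]]:
    """Return (ancestor, descendant) pairs. Regions nest properly, so d lies
    inside a exactly when a.start < d.start < a.end."""
    descendants = sorted(descendants, key=lambda r: r.start)
    starts = [r.start for r in descendants]  # stand-in for a B+-tree on start
    pairs = []
    for a in ancestors:
        lo = bisect_right(starts, a.start)  # first candidate after a opens
        hi = bisect_left(starts, a.end)     # first candidate after a closes
        for d in descendants[lo:hi]:
            if not parent_child or d.level == a.level + 1:
                pairs.append((a, d))
    return pairs


doc = ET.fromstring("<doc><section><para/><section><para/></section></section></doc>")
regions = region_encode(doc)
sections = [r for r in regions if r.tag == "section"]
paras = [r for r in regions if r.tag == "para"]
print(len(structural_join(sections, paras)))                     # 3 ancestor-descendant pairs
print(len(structural_join(sections, paras, parent_child=True)))  # 2 parent-child pairs
```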
Descriptors 
Advances in Knowledge Discovery and Data Mining. 6th Pacific-Asia Conference, PAKDD 2002.
Source

Advances in Knowledge Discovery and Data Mining. 6th Pacific-Asia Conference, PAKDD 2002. 
Proceedings (Lecture Notes in Artificial Intelligence Vol.2336), 2002, p. 341-55, 11 refs, pp. xiii+568, 
ISBN: 3-540-43704-5. 
Publisher: Springer-Verlag, Berlin, Germany. 
Author(s)

Miyahara-T, Suzuki-Y, Shoudai-T, Uchida-T, Takahashi-K, Ueda-H.

Editor(s): Chen-M-S, Yu-P-S, Liu-B.
Author affiliation 

Miyahara, T., Fac. of Inf. Sci., Hiroshima City Univ. 
Abstract 

Many Web documents such as HTML files and XML files have no rigid structure and are called semi- 
structured data. In general, such semi-structured Web documents are represented by rooted trees 
with ordered children. We propose a method for discovering frequent tree structured patterns in semi- 
structured Web documents by using a tag tree pattern as a hypothesis. A tag tree pattern is an 
edge labeled tree with ordered children which has structured variables. An edge label is a tag or a 
keyword in such Web documents, and a variable can be substituted by an arbitrary tree. So a tag 
tree pattern is suited to representing tree structured patterns in such Web documents. First we show 
that it is hard to compute the optimum frequent tag tree pattern. So we present an algorithm for 
generating all maximally frequent tag tree patterns and give the correctness of it. Finally, we report 
some experimental results on our algorithm. Although this algorithm is not efficient, experiments show 
that we can extract characteristic tree structured patterns in those data. 
Descriptors 
Advances in Knowledge Discovery and Data Mining. 5th Pacific-Asia Conference, PAKDD 2001.
Proceedings (Lecture Notes in Artificial Intelligence Vol.2035), 2001, p. 47-52, 10 refs, pp. xviii+596,
ISBN: 3-540-41910-1. 
Publisher: Springer-Verlag, Berlin, Germany. 
Author(s) 

Miyahara-T, Shoudai-T, Uchida-T, Takahashi-K, Ueda-H.

Editor(s): Cheung-D, Williams-G-J, Li-Q.
Author affiliation 

Miyahara, T., Fac. of Inf. Sci., Hiroshima City Univ.
Abstract 

Many documents such as Web documents or XML files have no rigid structure. Such semistructured 
documents have been rapidly increasing. We propose a new method for discovering frequent tree 
structured patterns in semistructured Web documents. We consider the data mining problem of 
finding all maximally frequent tag tree patterns in semistructured data such as Web documents. A 
tag tree pattern is an edge labeled tree which has hyperedges as variables. An edge label is a tag or 
a keyword in Web documents, and a variable can be substituted by any tree. So a tag tree pattern is 
suited for representing tree structured patterns in semistructured Web documents. We present an 
algorithm for finding all maximally frequent tag tree patterns. Also we report some experimental 
results on XML documents by using our algorithm. 
Publisher: Korea Inf. Sci. Soc, South Korea.
Author(s) 

Sueng-chun-Kang, Kwangsue-Chung.
Abstract 

In this paper, we propose an efficient document conversion mechanism to provide an adaptive Web 
document to mobile terminals. We also propose a RHTML (reduced HTML) to achieve the adaptive tag
reduction. A markup error correction process in the proposed adaptive document conversion
mechanism converts an HTML (HyperText Markup Language) document into an XML (Extensible
Markup Language) application document. This process makes Web documents easy to handle with
DOM (document object model) as the tree model, and it removes the hardware overhead in mobile
terminals. Also, a tag reduction process provides the adaptive Web document with three DTDs
(document type definitions) in the RHTML.
Source

Proceedings 17th International Conference on Data Engineering, 2001, p. 321-9, 23 refs, pp. xxii+666, 
ISBN: 0-7695-1001-9. 

Publisher: IEEE Comput. Soc, Los Alamitos, CA, USA. 
Author(s) 

Schmidt-A, Kersten-M, Windhouwer-M. 
Author affiliation 

Schmidt, A., Kersten, M., Windhouwer, M., CWI, Amsterdam, Netherlands. 
Abstract 

Due to the ubiquity and popularity of XML, users often find themselves in the following situation: they want to 
query XML documents that contain potentially interesting information, but they are unaware of the 
mark-up structure that is used. For example, it is easy to guess the contents of an XML bibliography 
file, whereas the mark-up depends on the methodological, cultural and personal background of the 
author(s). Nonetheless, it is this hierarchical structure that forms the basis of XML query languages. 
We exploit the tree structure of XML documents to equip users with a powerful tool, the meet 
operator, which lets them query databases whose content they are familiar with, but without requiring 
knowledge of tags and hierarchies. Our approach is based on computing the lowest common ancestor 
of nodes in the XML syntax tree: e.g., given two strings, we are looking for nodes whose offspring 
contains these two strings. The novelty of this approach is that the result type is unknown at query 
formulation time and depends on the database instance. If the two strings are an author's name and 
a year, mainly publications of that author in that year are returned. If the two strings are numbers, the 
result mostly consists of publications that have the numbers as year or page numbers. Because the 
result type of a query is not specified by the user, we refer to the lowest common ancestor as the nearest 
concept. We also present a running example taken from the bibliography domain, and demonstrate 
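The core of the meet operator, finding the lowest common ancestor of nodes whose content contains two given strings, can be sketched with `xml.etree.ElementTree`. This is a simplified sketch, not the authors' implementation: it matches each string only against an element's own text (not tails or attributes), computes the LCA for every pair of matches by walking parent links, and returns the distinct results. The function names and the toy bibliography are hypothetical.

```python
import xml.etree.ElementTree as ET


def path_to_root(node, parents):
    """List node and its ancestors, nearest first."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path


def meet(root, s1, s2):
    """Return the distinct lowest common ancestors of element pairs whose text
    contains s1 and s2 respectively (a simplified 'meet' over element text)."""
    parents = {child: parent for parent in root.iter() for child in parent}

    def matches(s):
        return [el for el in root.iter() if el.text and s in el.text]

    def lca(x, y):
        ancestors_of_x = set(path_to_root(x, parents))
        for node in path_to_root(y, parents):
            if node in ancestors_of_x:
                return node
        return None  # unreachable for two nodes of the same tree

    result = []
    for x in matches(s1):
        for y in matches(s2):
            node = lca(x, y)
            if node is not None and all(node is not r for r in result):
                result.append(node)
    return result


bib = ET.fromstring(
    "<bib>"
    "<article><author>Smith</author><year>1999</year><title>A</title></article>"
    "<article><author>Jones</author><year>2001</year><title>B</title></article>"
    "</bib>"
)
print([m.tag for m in meet(bib, "Smith", "1999")])  # ['article']
print([m.tag for m in meet(bib, "Smith", "2001")])  # ['bib']
```

As the abstract notes, the result type is not fixed by the query: an author's name and a matching year meet at an `article`, while an unrelated pair only meets at the root.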
Abstract 

Among the HTML elements, HTML tables encapsulate hierarchically structured data (hierarchical data, for 
short) in a tabular structure. HTML tables do not come with a rigid schema, and almost any form of 
two-dimensional table is acceptable according to the HTML grammar. This relaxation complicates the 
process of retrieving hierarchical data from HTML tables. We propose an automated approach for 
retrieving hierarchical data from HTML tables. The proposed approach constructs the content tree of an 
HTML table, which captures the intended hierarchy of the data content of the table, without requiring 
the internal structure of the table to be known beforehand. Also, the user of the content tree need not 
deal with HTML tags while retrieving the desired data from the content tree. Our approach can be 
employed by: (i) a query language written for retrieving hierarchically structured data, extracted from 
either the contents of HTML tables or other sources; (ii) a processor for converting HTML tables to XML 
documents; and (iii) a data warehousing repository for collecting hierarchical data from HTML tables 
and storing materialized views of the tables. The time complexity of the proposed retrieval approach is 
proportional to the number of HTML elements in an HTML table. 
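For the simplest table layout, the idea of a content tree can be sketched in Python: read the cells in one pass over the markup (so the work grows linearly with the number of HTML elements), then nest each value under its row header and column header so that a caller retrieves data by header names rather than by tags. The sketch assumes, unlike the general approach described above, that the first row holds the column headers, the first column holds the row headers, there are no `rowspan`/`colspan` attributes, and every cell has an explicit end tag. The class, function and sample table are hypothetical.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Collect the text of every cell, row by row, from one HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def content_tree(html):
    """Nest cell values under their row header, then their column header."""
    reader = TableReader()
    reader.feed(html)
    reader.close()
    header, *body = reader.rows
    return {row[0]: dict(zip(header[1:], row[1:])) for row in body}


table = """<table>
<tr><th></th><th>2000</th><th>2001</th></tr>
<tr><th>Sales</th><td>10</td><td>12</td></tr>
<tr><th>Costs</th><td>7</td><td>9</td></tr>
</table>"""
tree = content_tree(table)
print(tree)
# {'Sales': {'2000': '10', '2001': '12'}, 'Costs': {'2000': '7', '2001': '9'}}
print(tree["Sales"]["2001"])  # 12
```

The lookup `tree["Sales"]["2001"]` illustrates the point made in the abstract: once the content tree exists, the user navigates the data hierarchy without touching any HTML tags.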
Abstract 

XML is fast gaining currency as the standard for Web-based data transmission. But how will XML 
documents be viewed by all those non-XML browsers? The author has come up with an approach that 
brings some of the benefits of XML-based documents to non-XML browsers. His workaround is a 
server-side conversion of XML documents to JavaScript code; this code gets interpreted by the 
browser and results in a data structure roughly equivalent to the parse tree that would have been 
produced by an XML-enabled browser. Transforming XML documents from a tag stream to the DOM 
(Document Object Model) gives data consumers the same kind of increased accessibility 
that moving data from databases to XML data sources gives data producers. With XML 
represented at the level of the DOM, Web-based consumers are freed both from the need for an XML 
parser and from the need for direct access to the original XML data sources. Applets, scriptlets, 
ActiveX controls, and other client-side components have the same programmatic access to browser- 
based XML documents as they have to the rest of the browser's DOM. As it turns out, this 
workaround offers significant advantages over a pure XML approach: it is a lot faster, and the code to 
manipulate XML-derived objects is cleaner and more concise. 
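The server-side conversion described above can be sketched in Python: parse the XML on the server, map each element to plain nested data (tag, attributes, text, children), and emit a single JavaScript statement whose object literal rebuilds that tree when the browser runs it. This is a minimal sketch, not the author's code; the object layout, the function names and the variable name `xmlDoc` are assumptions. It relies on JSON output being usable as a JavaScript object literal, and escapes `</` so the statement does not prematurely close an enclosing `<script>` element.

```python
import json
import xml.etree.ElementTree as ET


def element_to_obj(el):
    """Map an element to nested plain data: tag, attributes, text and children."""
    return {
        "tag": el.tag,
        "attributes": dict(el.attrib),
        "text": (el.text or "").strip(),
        "children": [element_to_obj(child) for child in el],
    }


def xml_to_javascript(xml_text, var_name="xmlDoc"):
    """Emit one JavaScript statement that rebuilds the parse tree as an object literal."""
    literal = json.dumps(element_to_obj(ET.fromstring(xml_text)))
    # "<\/" means the same as "</" inside a JavaScript string, but cannot end a <script>.
    literal = literal.replace("</", "<\\/")
    return f"var {var_name} = {literal};"


print(xml_to_javascript('<book id="7"><title>XML</title></book>'))
# var xmlDoc = {"tag": "book", "attributes": {"id": "7"}, "text": "", "children": [{"tag": "title", "attributes": {}, "text": "XML", "children": []}]};
```

Client-side code can then read `xmlDoc.children[0].text` directly, without an XML parser in the browser, which is the benefit the article attributes to this workaround.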